Count data in biology—Data transformation or model reformation?

Authors

  • Anne P St-Pierre
  • Violaine Shikon
  • David C Schneider
Abstract

Statistical analyses are an integral component of scientific research, and for decades, biologists have applied transformations to data to meet the normal error assumptions for F and t tests. Over the years, there has been a movement from data transformation toward model reformation: the use of non-normal error structures within the framework of the generalized linear model (GLM). The principal advantage of model reformation is that parameters are estimated on the original, rather than the transformed, scale. However, data transformation has been shown to give better control over type I error for simulated data with known error structures. We conducted a literature review of statistical textbooks directed toward biologists and of journal articles published in the primary literature to determine temporal trends in both the textbook recommendations and the practice in the refereed literature over the past 35 years. In this review, a trend of increasing use of reformation in the primary literature was evident, moving from no use of reformation before 1996 to more than 50% of the articles reviewed applying GLM after 2006. However, no such trend was observed in the recommendations in statistical textbooks. We then undertook 12 analyses based on published datasets in which we compared the type I error estimates, residual plot diagnostics, and coefficients yielded by analyses using square root transformations, log transformations, and the GLM. All analyses yielded acceptable residual versus fit plots and had similar p-values within each analysis, but as expected, the coefficient estimates differed substantially. Furthermore, no consensus could be found in the literature regarding a procedure to back-transform the coefficient estimates obtained from linear models performed on transformed datasets. This lack of consistency among coefficient estimates constitutes a major argument for model reformation over data transformation in biology.
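The contrast the abstract draws between coefficients from transformed-data linear models and from a GLM can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' procedure or data: the counts are simulated from a Poisson model with made-up parameters (`TRUE_B0`, `TRUE_B1`), the log transformation is taken as log(y + 1) so that zero counts are defined, and the GLM is a Poisson model with log link fitted by a hand-written Newton-Raphson loop instead of a statistics package.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def ols(x, y):
    """Least-squares (intercept, slope) for a single predictor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def poisson_glm(x, y, max_iter=50, tol=1e-10):
    """Poisson GLM with log link, (intercept, slope) by Newton-Raphson."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the null model
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Hypothetical simulation settings, not taken from the article.
TRUE_B0, TRUE_B1 = 1.0, 0.3
rng = random.Random(1)
x = [float(level) for level in range(5) for _ in range(200)]
y = [poisson_draw(math.exp(TRUE_B0 + TRUE_B1 * xi), rng) for xi in x]

sqrt_fit = ols(x, [math.sqrt(yi) for yi in y])    # slope on the sqrt scale
log_fit = ols(x, [math.log(yi + 1) for yi in y])  # slope on the log(y + 1) scale
glm_fit = poisson_glm(x, y)                       # slope on the log-mean scale

print("sqrt-transform slope:", round(sqrt_fit[1], 3))
print("log(y+1)-transform slope:", round(log_fit[1], 3))
print("Poisson GLM slope:", round(glm_fit[1], 3))
```

Under these assumptions the three slopes are each expressed on a different scale, so they are not directly comparable without a back-transformation, whereas the GLM slope describes the log of the mean count on the original scale.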


Similar resources

Estimation of Count Data using Bivariate Negative Binomial Regression Models

Abstract Negative binomial regression model (NBR) is a popular approach for modeling overdispersed count data with covariates. Several parameterizations have been performed for NBR, and the two well-known models, negative binomial-1 regression model (NBR-1) and negative binomial-2 regression model (NBR-2), have been applied. Another parameterization of NBR is negative binomial-P regression mode...


An alternative transformation in ranking using l1-norm in data envelopment analysis

Jahanshahloo et al. (Appl Math Comput 153:215–224, 2004) propose a method for ranking extremely efficient decision making units (DMUs) in data envelopment analysis (DEA) using super-efficiency technique and l1-norm and they show that the presented method is able to eliminate the existing difficulties in some methods. This paper suggests an alternative transformation to convert the nonlinear mod...


Using multivariate generalized linear latent variable models to measure the difference in event count for stranded marine animals

BACKGROUND AND OBJECTIVES: The classification of marine animals as protected species makes data and information on them to be very important. Therefore, this led to the need to retrieve and understand the data on the event counts for stranded marine animals based on location emergence, number of individuals, behavior, and threats to their presence. Whales are g...


Fitting of Count Time Series Models on the Number of Patients Referred to Addiction Treatment Centers in Semnan County

Abstract. Count data over time are observed in many application areas. Many researchers use time series patterns to analyze this data. In this paper, Poisson and negative binomial count time series linear models with explanatory variables are studied for this type of data. The likelihood analysis and the evaluation of count time series model based on generalized linear models are pres...


Hurdle, Inflated Poisson and Inflated Negative Binomial Regression Models for Analysis of Count Data with Extra Zeros

In this paper, we propose hurdle regression models for analysing count responses with extra zeros. Maximum likelihood is used to estimate the model parameters. The application of the proposed model is presented on an insurance dataset. In this example, many of the claim counts are equal to zero, which clarifies the application of the model with a zero-inflat...



Journal title:

Volume 8, Issue

Pages -

Publication date: 2018